This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

when writing to disk bucket index, tune towards packing tighter #30761

Merged (2 commits) on Mar 17, 2023

Conversation

jeffwashington
Contributor

Problem

see #30711
The current implementation of disk buckets (as used by the accounts index on disk) was optimized for use as a hashmap with good speed in all cases.
The implementation in the validator synchronizes the in-mem hash map with the disk-based one in the background.
Currently, we resize a data bucket when we fail to find an empty spot after starting at a random offset and searching up to max_search locations, which defaults to approximately 32.
This max_search limit makes sense for the index buckets, where we have to exhaustively search on read and write to prove something does not exist. For data buckets, we just need to find any vacant spot to store data. The offset will then be stored in the index bucket.
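
The bounded search described above can be sketched roughly as follows. This is a simplified illustration, not the code in bucket_map: the `DataBucket` type, its `occupied` field, and the `find_free` and `try_allocate` methods are hypothetical names, and the caller supplies the starting offset instead of drawing it at random.

```rust
/// Hypothetical, simplified data bucket: each slot is either occupied or free.
/// (Illustrative only; not the structure used in bucket_map.)
pub struct DataBucket {
    occupied: Vec<bool>,
}

impl DataBucket {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            occupied: vec![false; capacity],
        }
    }

    /// Probe at most `max_search` slots, starting at `start` and wrapping
    /// around the end of the bucket. Returns the first free slot found.
    pub fn find_free(&self, start: usize, max_search: usize) -> Option<usize> {
        let cap = self.occupied.len();
        if cap == 0 {
            return None;
        }
        let start = start % cap;
        (0..max_search.min(cap))
            .map(|i| (start + i) % cap)
            .find(|&ix| !self.occupied[ix])
    }

    /// Claim a free slot if the bounded search finds one.
    /// `None` is the signal that the caller would resize the bucket.
    pub fn try_allocate(&mut self, start: usize, max_search: usize) -> Option<usize> {
        let ix = self.find_free(start, max_search)?;
        self.occupied[ix] = true;
        Some(ix)
    }
}
```

In this sketch, a bucket of 8 slots with slots 0 through 3 occupied reports failure for a search of 4 locations starting at offset 0, even though half of the bucket is still empty. That is the kind of waste a short search can cause in data buckets.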

Summary of Changes

Search 10x as many locations before resizing disk buckets. This results in more compact data buckets, improving performance for reads and writes. Insertions or updates with grown/shrunk slot lists can be slower, but these only happen in the background.
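
To illustrate why a longer search packs tighter, here is a small self-contained simulation. It is a sketch under stated assumptions, not the bucket_map code: `entries_before_resize` and the `XorShift64` generator are hypothetical helpers, and the bucket is modeled as a plain vector of occupied flags. It counts how many entries fit before the first failed search, which is the point where a resize would be triggered.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fill a bucket of `capacity` slots. Each insert starts probing at a
/// pseudo-random offset and checks at most `max_search` slots.
/// Returns how many entries were stored before the first failed search.
pub fn entries_before_resize(capacity: usize, max_search: usize, seed: u64) -> usize {
    let mut occupied = vec![false; capacity];
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let window = max_search.min(capacity);
    let mut stored = 0;
    while capacity > 0 {
        let start = (rng.next_u64() % capacity as u64) as usize;
        let free = (0..window)
            .map(|i| (start + i) % capacity)
            .find(|&ix| !occupied[ix]);
        match free {
            Some(ix) => {
                occupied[ix] = true;
                stored += 1;
            }
            None => break, // a resize would happen here
        }
    }
    stored
}
```

With the same seed, both runs see the same sequence of starting offsets, so a call such as `entries_before_resize(4096, 320, 42)` can never store fewer entries than `entries_before_resize(4096, 32, 42)`: the two runs behave identically until the shorter search fails, and the longer search may keep going. The values 32 and 320 mirror the "approximately 32" default and the 10x factor mentioned in this PR; the capacity and seed are arbitrary.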

Fixes #

@codecov

codecov bot commented Mar 17, 2023

Codecov Report

Merging #30761 (4d4a009) into master (62fe6ea) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           master   #30761     +/-   ##
=========================================
- Coverage    81.6%    81.6%   -0.1%     
=========================================
  Files         723      723             
  Lines      201791   201791             
=========================================
- Hits       164863   164804     -59     
- Misses      36928    36987     +59     

Contributor

@brooksprumo brooksprumo left a comment

Are there already validator runtime metrics/perf results with this change?

Review comment on bucket_map/src/bucket.rs (outdated, resolved)
@jeffwashington
Contributor Author

> Are there already validator runtime metrics/perf results with this change?

[image: validator metrics graphs]

The light blue line is the validator with this change. It has approximately half the number of files open (bottom graph), and the data files use approximately 560M total bytes (this PR) vs. 750M (master), roughly 25% less. This means higher density and less waste to store the same data.

Contributor

@brooksprumo brooksprumo left a comment

lgtm

@jeffwashington jeffwashington merged commit 6dd5a22 into solana-labs:master Mar 17, 2023
behzadnouri pushed a commit to behzadnouri/solana that referenced this pull request Mar 18, 2023
…na-labs#30761)

* when writing to disk bucket index, tune towards packing tighter

* switch to min